-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add typing for pandas.core.frame.dropna #38968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes GH38948.
mypy justifiably requires an explicit `return None` when a function return is typed as `Optional`
Looking to add a "whatsnew" comment to the docs and see there's a tracker for v1.2.1 and v1.3.0, both of which aren't released. Which should I add to? |
It's also worth noting that pandas/core/computation/parsing.py:43: error: unused 'type: ignore' comment
pandas/io/common.py:182: error: Only @runtime_checkable protocols can be used with instance and class checks [misc]
pandas/io/common.py:491: error: Unsupported operand types for + ("List[Optional[str]]" and "List[str]") [operator]
pandas/_testing/contexts.py:128: error: Argument 1 to "close" has incompatible type "Optional[int]"; expected "int" [arg-type]
pandas/core/aggregation.py:485: error: Value of type variable "_LT" of "sorted" cannot be "Optional[Hashable]" [type-var]
pandas/core/aggregation.py:741: error: Value of type variable "_LT" of "sorted" cannot be "Optional[Hashable]" [type-var]
pandas/core/groupby/ops.py:286: error: Incompatible types in assignment (expression has type "Index", variable has type "Iterator[Any]") [assignment]
pandas/core/groupby/ops.py:286: note: 'Index' is missing following 'Iterator' protocol member:
pandas/core/groupby/ops.py:286: note: __next__
pandas/core/reshape/pivot.py:198: error: Incompatible types in assignment (expression has type "Optional[DataFrame]", variable has type "DataFrame") [assignment]
pandas/tests/groupby/test_categorical.py:1676: error: Item "None" of "Optional[DataFrame]" has no attribute "astype" [union-attr]
pandas/tests/window/conftest.py:119: error: unused 'type: ignore' comment
pandas/tests/window/conftest.py:335: error: unused 'type: ignore' comment
pandas/tests/arrays/string_/test_string.py:25: error: unused 'type: ignore' comment
Found 12 errors in 9 files (checked 1160 source files) |
@mzeitlin11 Any feedback would be appreciated - I'm not understanding why I'm seeing these build fails, particularly as the bits that are failing don't seem to be related to any changes I made! |
As the `Optional[DataFrame]` return type creates a union of incompatible types (DataFrame and None), we overload the signature on the value of the `inplace` parameter, to force the return of a DataFrame when inplace is false. Fixes GH38948
For GH38948
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK Literal
is a Python 3.8 feature whereas we still support Python 3.7 so so I think we want to use a workaround for now.
Also this doesn't actually pass mypy at the moment:
pandas/core/frame.py:5076: error: Overloaded function signatures 1 and 2 overlap with incompatible return types [misc]
(although I'm not clear on why that is...)
pandas/core/frame.py
Outdated
@@ -5070,14 +5070,38 @@ def notna(self) -> DataFrame: | |||
def notnull(self) -> DataFrame: | |||
return ~self.isna() | |||
|
|||
# Overload function signature to prevent union of conflicting types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop this comment
pandas/core/frame.py
Outdated
how: str = ..., | ||
thresh: Optional[int] = ..., | ||
subset: Optional[Union[Hashable, Sequence[Hashable]]] = ..., | ||
inplace: Literal[False] = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should you be specifying the default value here? (I'm not sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we were typing at the time :P My understanding of overloading (and this might be very wrong) is that you only specify the defaults in overloads for those parameters you are overloading, in this case inplace
.
See below for an example taken from pandas.core.frame.reset_index
.
Lines 4812 to 4843 in 67d4cae
@overload | |
# https://github.com/python/mypy/issues/6580 | |
# Overloaded function signatures 1 and 2 overlap with incompatible return types | |
def reset_index( # type: ignore[misc] | |
self, | |
level: Optional[Union[Hashable, Sequence[Hashable]]] = ..., | |
drop: bool = ..., | |
inplace: Literal[False] = ..., | |
col_level: Hashable = ..., | |
col_fill: Label = ..., | |
) -> DataFrame: | |
... | |
@overload | |
def reset_index( | |
self, | |
level: Optional[Union[Hashable, Sequence[Hashable]]] = ..., | |
drop: bool = ..., | |
inplace: Literal[True] = ..., | |
col_level: Hashable = ..., | |
col_fill: Label = ..., | |
) -> None: | |
... | |
def reset_index( | |
self, | |
level: Optional[Union[Hashable, Sequence[Hashable]]] = None, | |
drop: bool = False, | |
inplace: bool = False, | |
col_level: Hashable = 0, | |
col_fill: Label = "", | |
) -> Optional[DataFrame]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah in this example the defaults are not specified in the overloaded signatures so I think you should get rid of those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That example also explains the mypy error (it's caused to a mypy bug). So to get mypy passing you need to add a type ignore (and a comment with the link to mypy issue and the error message above the overloaded signature as done in the reset_index
example)
Those things being done I think this will be good to go in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! Changes forthcoming...
So overloading the function signature to prevent type collision solved most of the bugs. I'm left with a mypy error about overloaded signature overlap with conflicting return types, which is not present when I run |
We generally don't add a whatsnew for a pure typing PR |
not needed for typing |
GH38948 pull request review
yeah that's fine (esp as it's already imported) |
To pass mypy checks for GH38948
@@ -5070,14 +5070,38 @@ def notna(self) -> DataFrame: | |||
def notnull(self) -> DataFrame: | |||
return ~self.isna() | |||
|
|||
@overload | |||
# https://github.com/python/mypy/issues/6580 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this comment has to go above the decorator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't in pandas.core.frame.reset_index
, which doesn't seem to cause any problems. Also, it passed the typing portion of CI/Checks build, so I think we are fine. Happy to move it if there is a style preference though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a preference since it passes CI
Happy to see it approved! Let me know if I can help with anything else! |
FYI there will be at least one more round of review from someone else ;)
There are some issues in the tracker but we don't usually open issues unless there's something specific to discuss. We'd love for you to chime in and help to increase our coverage. Focused PRs like this one are best. For easy annotations (bool, str, etc.) we'll usually do a pass through a file just doing those and leaving the hard ones for dedicated PRs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rbpatt2019 for the PR. small targeted PRs adding type hints always welcome.
axis: Axis = ..., | ||
how: str = ..., | ||
thresh: Optional[int] = ..., | ||
subset: Optional[Union[Hashable, Sequence[Hashable]]] = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we now have an IndexLabel
alias in pandas._typing that could be used for subset.
subset: Optional[Union[Hashable, Sequence[Hashable]]] = ..., | |
subset: Optional[IndexLabel] = ..., |
inplace: bool = False, | ||
): | ||
) -> Optional[DataFrame]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you used #37137 as a example. However the motivation for that PR was to remove an assert df is not None
from the codebase and the return type of Optional[DataFrame]
not touched.
Not needed to be done in this PR, but just FYI if the motivation for adding the types is for the public api, then the return type should be the same type as self for subclassed DataFrames.
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
@rbpatt2019 can you update to comments and merge master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi - is this still active? If so, please see set_axis
and reset_index
for examples of these kinds of overloads (or, this blog post, which is still work-in-progress but does have some examples of how to do this)
closing as stale. @rbpatt2019 please ping if you want to pick this up again. |
This PR adds type hinting for pandas.core.frame.dropna. Additionally, an explicit
return None
was added to the if/else block for in-place changes to meet mypy standards and make the types explicit.I haven't yet run the tests, because I didn't modify any source code. Once/if the CI builds pass, I'll check it off the list.